ABSTRACT

For $k \ge 2$ independent binomial populations, from which $X_i \sim B(n_i, \theta_i)$, $i = 1, \ldots, k$,
have been observed, the problem of selecting the population with the largest $\theta$-value and
simultaneously estimating the $\theta$-parameter of the selected population is considered. Under
several loss functions, Bayes decision rules are derived and studied for independent Beta-priors.
A fixed sample size look ahead procedure is also considered. A numerical example is given to
illustrate the performance of the procedures.
1. Introduction


Let $k \ge 2$ binomial populations be given, from which independent observations
$X_i \sim B(n_i, \theta_i)$, $i = 1, \ldots, k$, have been drawn, where $n_1, \ldots, n_k$ are assumed to be known.
Suppose we want to find the population with the largest success probability, i.e. $\theta$-value,
and simultaneously estimate the parameter $\theta$ of the selected population.

With two exceptions, all results in the vast literature on ranking and selection are separate
treatments of one of the two decision problems. Cohen and Sackrowitz (1988) presented a
decision-theoretic framework, but derived results only for $k = 2$ normal and uniform
distributions with $n_1 = n_2$. Gupta and Miescke (1990) extended these results for normal
populations to $k \ge 2$, to not necessarily equal sample sizes, and to a larger class of loss functions.

Estimating the mean of the selected population has so far been treated in the literature
only under the assumption that the “natural” selection rule is employed, which selects
in terms of the largest sample mean, i.e. in the present framework in terms of the largest
$X_i/n_i$, $i = 1, \ldots, k$. Further discussions and references can be found in Gupta and Miescke
(1990).

It is well known by now that the “natural” selection rule does not always perform
satisfactorily under nonsymmetric models. It is more reasonable to incorporate loss due
to selection and loss due to estimation in one loss function and then let both types of
decision, selection and estimation, be subject to risk evaluation. Rather than “estimating
after selection”, the decision theoretic treatment leads to “selecting after estimation”, as
has been pointed out by Cohen and Sackrowitz (1988). This will be shown in Section
2 in a general framework. Bayes rules for independent Beta-priors will be derived and
studied in Section 3, and a fixed sample size look ahead procedure is the topic of Section
4. A numerical example from Abughalous and Miescke (1989) will be reconsidered in
the present setting at the ends of Sections 3 and 4 to illustrate the performance of the
derived procedures.

2. General Framework

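Let $\underline{X} = (X_1, \ldots, X_k)$ be a random vector of observations, where $X_i \sim B(n_i, \theta_i)$,
$i = 1, \ldots, k$, are independent binomially distributed with known $n_1, \ldots, n_k$ and unknown
parameters $\theta_1, \ldots, \theta_k$ in the unit interval. The likelihood function is thus given by

$$ f(\underline{x} \mid \underline{\theta}) = \prod_{i=1}^{k} f_i(x_i \mid \theta_i) = \prod_{i=1}^{k} \binom{n_i}{x_i} \theta_i^{x_i} (1 - \theta_i)^{n_i - x_i}, \tag{1} $$

where $x_i \in \{0, 1, \ldots, n_i\}$ and $\theta_i \in [0, 1]$, $i = 1, \ldots, k$.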

The goal is to select that population, i.e. coordinate, which is associated with $\theta_{[k]} =
\max\{\theta_1, \ldots, \theta_k\}$, and to simultaneously estimate the $\theta$-value of the selected population.
Since Bayes rules are the main topic of this paper, only nonrandomized decision rules need
to be considered, which can be represented by

$$ d(\underline{x}) = \big(s(\underline{x}),\, t_{s(\underline{x})}(\underline{x})\big), \quad x_i \in \{0, 1, \ldots, n_i\},\ i = 1, \ldots, k, \tag{2} $$

where $s(\underline{x}) \in \{1, 2, \ldots, k\}$ is the selection rule, and where $t_i(\underline{x}) \in [0, 1]$, $i = 1, \ldots, k$, is a
collection of $k$ estimates of $\theta_i$, $i = 1, \ldots, k$, respectively, available at selection.

The loss function is assumed to be a member of the following class:

$$ L(\underline{\theta}, (s, t_s)) = A(\underline{\theta}, s) + B(\underline{\theta}, s)\,[\theta_s - t_s]^2, \tag{3} $$

which represents the combined loss at $\underline{\theta}$ if population (i.e. coordinate) $s$ is selected and $t_s$
is used as an estimate of $\theta_s$. Two special types of loss functions will be considered later
on in connection with conjugate Beta-priors. The first is called

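Additive Type:

$A_1(\underline{\theta}, s) = \theta_{[k]} - \theta_s$, or $A_2(\underline{\theta}, s) = \theta_s^{-c}(1 - \theta_s)^{d}$, $c, d \ge 0$.

$B_1(\underline{\theta}, s) = p$, or $B_2(\underline{\theta}, s) = p\,[\theta_s(1 - \theta_s)]^{-1}$, $p > 0$.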

Hereby, any choice of $A_1$ or $A_2$ represents loss due to selection, $B_1$ controls the relative
importance of selection and estimation, and $B_2$ additionally adjusts the precision of the estimate
$t_s$ to the position of $\theta_s$ in $[0, 1]$. A justification of the latter will be given later. The second
type is called

Multiplicative Type:

$A_3(\underline{\theta}, s) = 0$, and $B_3(\underline{\theta}, s) = \theta_s^{-c}(1 - \theta_s)^{d}$, $c, d \ge 0$.

Hereby, the loss due to selection, the relative importance of selection and estimation,
and the adjustment (or non-adjustment) of the precision of the estimate to the position of the
parameter are all represented by the two parameters $c$ and $d$.

In the Bayes approach, let the vector of $k$ unknown parameters be random and denoted
by $\underline{\Theta}$. The prior is assumed to have a density $\pi(\underline{\theta})$, $\underline{\theta} \in [0, 1]^k$, with respect to
the Lebesgue measure, with posterior density denoted by $\pi(\underline{\theta} \mid \underline{x})$ and marginal posterior
densities $\pi_i(\theta_i \mid \underline{x})$, $i = 1, \ldots, k$. In the latter, the index $i$ at $\pi_i$ will be suppressed for simplicity
whenever it is clear from the context what is meant.
As has been mentioned in the Introduction, the decision theoretic treatment of the
combined selection-estimation problem leads to “selection after estimation”, which was
first pointed out by Cohen and Sackrowitz (1988). Similar to Lemma 1 in Gupta and
Miescke (1990), the following extension can be seen to hold.


Lemma 1. Let $t_i^*(\underline{x})$ minimize $E\{B(\underline{\Theta}, i)[\Theta_i - t_i]^2 \mid \underline{X} = \underline{x}\}$ for $t_i \in [0, 1]$, $i = 1, \ldots, k$.
Furthermore, let $s^*(\underline{x})$ minimize $E\{A(\underline{\Theta}, i) + B(\underline{\Theta}, i)[\Theta_i - t_i^*(\underline{x})]^2 \mid \underline{X} = \underline{x}\}$, $i = 1, \ldots, k$.
Then the Bayes rule at $\underline{X} = \underline{x}$ is $d^*(\underline{x}) = \big(s^*(\underline{x}),\, t^*_{s^*(\underline{x})}(\underline{x})\big)$.

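We can get some steps further toward finding the Bayes rule explicitly under
a loss of the additive or multiplicative type and independent Beta-priors if we restrict
considerations to those situations $\underline{X} = \underline{x}$ where $B(\underline{\theta}, i)\pi(\underline{\theta} \mid \underline{x})$ is integrable on $[0, 1]^k$ and
has second moments of $\theta_i$, $i = 1, \ldots, k$. Cases where this does not hold will not cause any
major problems. They occur, if at all, at the lower ends of the ranges of $X_1, \ldots, X_k$. For
$i = 1, \ldots, k$, let

$$ \tilde{\pi}_i(\theta_i \mid \underline{x}) = \tau_i(\theta_i \mid \underline{x}) \Big/ \int_0^1 \tau_i(\eta \mid \underline{x})\, d\eta, \quad \text{where} \quad \tau_i(\theta_i \mid \underline{x}) = \int_{[0,1]^{k-1}} B(\underline{\theta}, i)\, \pi(\underline{\theta} \mid \underline{x})\, d\underline{\theta}_{(i)}, \tag{4} $$

and $\underline{\theta}_{(i)} = (\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_k)$.


Then we can state the following result.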

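Theorem 1. At every $\underline{X} = \underline{x}$ for which $\tilde{\pi}_i(\theta_i \mid \underline{x})$ exists and has second moments,
$i = 1, \ldots, k$, the Bayes rule $d^*(\underline{x})$ satisfies $t_i^*(\underline{x}) = E^{\tilde{\pi}_i(\cdot \mid \underline{x})}(\Theta_i)$, $i = 1, \ldots, k$, and $s^*(\underline{x})$
minimizes $E^{\pi(\cdot \mid \underline{x})}(A(\underline{\Theta}, i)) + E^{\pi(\cdot \mid \underline{x})}(B(\underline{\Theta}, i))\, \mathrm{Var}^{\tilde{\pi}_i(\cdot \mid \underline{x})}(\Theta_i)$, $i = 1, \ldots, k$.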

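Proof: Suppose that at $\underline{X} = \underline{x}$ the $i$-th population is selected. Then, by Lemma 1, $t_i^*(\underline{x})$
has to minimize

$$ E\{B(\underline{\Theta}, i)[\Theta_i - t_i]^2 \mid \underline{X} = \underline{x}\} \tag{5} $$

as a function of $t_i \in [0, 1]$. If now $\tilde{\pi}_i(\theta_i \mid \underline{x})$ exists, the conditional expectation (5) can be
written as

$$ \int_0^1 [\theta_i - t_i]^2\, \tilde{\pi}_i(\theta_i \mid \underline{x})\, d\theta_i \int_{[0,1]^k} B(\underline{\theta}, i)\, \pi(\underline{\theta} \mid \underline{x})\, d\underline{\theta}. \tag{6} $$

Thus, if $\tilde{\pi}_i(\theta_i \mid \underline{x})$ has a second moment, the minimum of (6) as a function of $t_i \in [0, 1]$
occurs at

$$ t_i^*(\underline{x}) = \int_0^1 \theta_i\, \tilde{\pi}_i(\theta_i \mid \underline{x})\, d\theta_i = E^{\tilde{\pi}_i(\cdot \mid \underline{x})}(\Theta_i), \tag{7} $$

and the minimum of (6) turns out to be the product of the variance of $\tilde{\pi}_i(\cdot \mid \underline{x})$ with the
expectation of $B(\underline{\Theta}, i)$ under $\pi(\cdot \mid \underline{x})$.

Once the best estimate $t_i^*(\underline{x})$ has been found for a possible use in connection with
selection of the $i$-th population, $i = 1, \ldots, k$, the optimum selection $s^*(\underline{x})$ minimizes the
sum of the expectation of $A(\underline{\Theta}, i)$ under $\pi(\cdot \mid \underline{x})$ and the product of the variance of $\tilde{\pi}_i(\cdot \mid \underline{x})$
with the expectation of $B(\underline{\Theta}, i)$ under $\pi(\cdot \mid \underline{x})$, $i = 1, \ldots, k$. This completes the proof of the
theorem.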
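As a numerical illustration of Theorem 1, the following Python sketch treats a single population whose posterior is $Be(a, b)$ and whose weight $B(\underline{\theta}, i)$ depends on $\theta_i$ only (an assumption that holds for every loss considered in Section 3 when the priors are independent, so that (4) reduces to a one-dimensional weighting). It forms the weighted density on a midpoint grid and returns the estimate (7) together with the product of $E^{\pi(\cdot \mid \underline{x})}(B)$ and $\mathrm{Var}^{\tilde{\pi}_i(\cdot \mid \underline{x})}(\Theta_i)$. The function name `theorem1_component` and the grid size are illustrative choices, not part of the paper.

```python
import math


def theorem1_component(weight, a, b, grid=20_000):
    """Numerical sketch of Theorem 1 for one population with posterior Be(a, b).

    weight(theta) stands in for B(theta, i), assumed to depend on theta_i only.
    Returns (t_star, risk): t_star is the mean of the weighted density
    pi~(theta) proportional to weight(theta) * posterior(theta), as in (4) and (7),
    and risk = E[weight(Theta)] * Var_{pi~}(Theta), the estimation part of the
    criterion minimized by s*(x).
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 1.0 / grid
    z = m1 = m2 = 0.0
    for j in range(grid):
        th = (j + 0.5) * h  # midpoint rule: never touches theta = 0 or 1
        dens = math.exp(log_norm + (a - 1) * math.log(th) + (b - 1) * math.log1p(-th))
        w = weight(th) * dens * h
        z += w              # accumulates E[weight(Theta)] under the posterior
        m1 += th * w
        m2 += th * th * w
    mean = m1 / z
    return mean, z * (m2 / z - mean * mean)


# Posterior Be(10, 12), i.e. alpha = beta = 1 with x = 9 successes in n = 20 trials.
print(theorem1_component(lambda th: 1.0, 10, 12))                      # B_1 with p = 1
print(theorem1_component(lambda th: 1.0 / (th * (1.0 - th)), 10, 12))  # B_2 with p = 1
```

With $B_1$ (and $p = 1$) the sketch returns approximately the posterior mean $10/22$ and posterior variance $120/11132$; with $B_2$ it returns approximately $0.45$ and $0.05$, in agreement with the closed forms for loss (16) derived in Section 3.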

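Remark 1. As we have seen, the proof of Theorem 1 proceeds componentwise. Thus,
this approach can also be used in other situations where the assumptions of the theorem
are fulfilled only for some $i \in M(\underline{x}) \subset \{1, \ldots, k\}$, say. For these populations $i \in M(\underline{x})$,
one may just proceed as in the proof. On the other hand, for every $j \notin M(\underline{x})$, one has to
find $t_j^*(\underline{x})$ by minimizing, as a function of $t_j \in [0, 1]$,

$$ \int_{[0,1]^k} B(\underline{\theta}, j)\, [\theta_j - t_j]^2\, \pi(\underline{\theta} \mid \underline{x})\, d\underline{\theta}, \tag{8} $$

which gives $t_j^*(\underline{x})$, and to use its minimum value as a substitute for the non-existing
product of the variance of $\tilde{\pi}_j(\cdot \mid \underline{x})$ with the expectation of $B(\underline{\Theta}, j)$ under $\pi(\cdot \mid \underline{x})$ in the final
minimization step that leads to $s^*(\underline{x})$, i.e. the optimum selection.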

Remark 2. It should be pointed out that all optimum estimates $t_i^*(\underline{x})$, $i = 1, \ldots, k$,
considered are the usual Bayes estimates if selection is ignored and estimation is restricted
to one population at a time.


3. Bayes Rules for Beta-Priors

In this section, we will derive the Bayes rules $d^*(\underline{x})$ explicitly and discuss their
properties, assuming the loss (3) is of the additive or multiplicative type and that, a priori,
$\Theta_1, \ldots, \Theta_k$ are independent and follow $k$ given Beta distributions. To recall briefly some
well-known facts, a random variable $\Theta$ is Beta-distributed with parameters $\alpha, \beta > 0$ if it
has the density

$$ \pi(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha - 1}(1 - \theta)^{\beta - 1}, \quad \theta \in [0, 1]. \tag{9} $$

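Its expectation and variance are given, respectively, by

$$ E^{\pi}(\Theta) = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \mathrm{Var}^{\pi}(\Theta) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}. \tag{10} $$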


This family of $Be(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, is conjugate to the binomial family, since the posterior
distributions are again of the Beta-type. More precisely, if $\Theta \sim Be(\alpha, \beta)$ and $X$, given
$\Theta = \theta$, is $B(n, \theta)$, then $\Theta$, given $X = x$, follows a $Be(\alpha + x, \beta + n - x)$ distribution, which is
called the posterior of $\Theta$ at $X = x$, $x \in \{0, 1, \ldots, n\}$. If finding the Bayes rule is our only
concern, the marginal distribution of $X$ does not need to be considered. It is only relevant
for averaging posterior expected loss, i.e. posterior risk. Although this will not be done
before Section 4, let us give it already here for the sake of simplicity and completeness.
The probability that $X$ is equal to $x$ is given by


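$$ m(x) = \binom{n}{x} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + x)\Gamma(\beta + n - x)}{\Gamma(\alpha + \beta + n)}, \quad x \in \{0, 1, \ldots, n\}, \tag{11} $$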


which is a Polya-Eggenberger distribution with the four parameters $n$, $\alpha$, $\beta$, and 1, cf.
Johnson and Kotz (1969), p. 230. Its expectation and variance are given, respectively, by


$$ E(X) = n\,\frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \mathrm{Var}(X) = \frac{n\alpha\beta(\alpha + \beta + n)}{(\alpha + \beta)^2(\alpha + \beta + 1)}. \tag{12} $$
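The marginal probabilities (11) and the moment formulas (12) are easy to check numerically. The sketch below (plain Python; the helper names are hypothetical) evaluates (11) on the log scale with `math.lgamma` to avoid overflow, sums it over $x$, and compares the resulting mean and variance with (12).

```python
import math


def beta_binomial_pmf(x, n, alpha, beta):
    """Marginal probability m(x) of (11), evaluated on the log scale."""
    log_choose = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    log_val = (log_choose
               + math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
               + math.lgamma(alpha + x) + math.lgamma(beta + n - x)
               - math.lgamma(alpha + beta + n))
    return math.exp(log_val)


def summed_moments(n, alpha, beta):
    """Total mass, mean and variance of (11) obtained by direct summation."""
    probs = [beta_binomial_pmf(x, n, alpha, beta) for x in range(n + 1)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    return sum(probs), mean, var


def closed_form_moments(n, alpha, beta):
    """Mean and variance according to (12)."""
    s = alpha + beta
    return n * alpha / s, n * alpha * beta * (s + n) / (s ** 2 * (s + 1))


print(summed_moments(20, 2.5, 4.0))
print(closed_form_moments(20, 2.5, 4.0))
```

For $\alpha = \beta = 1$ the same function returns $1/(n+1)$ for every $x$, i.e. the uniform marginal distribution associated with the prior $Be(1, 1)$.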


Finally, it should be mentioned that three of the four noninformative priors presented
in Berger (1985), p. 89, fit into our present framework: the uniform distribution $Be(1, 1)$,
which makes the marginal distribution of $X$ uniform as well; the proper prior $Be(\tfrac{1}{2}, \tfrac{1}{2})$;
and the improper prior which one gets as a limit of $Be(\alpha, \beta)$ as $\alpha$ and $\beta$ tend to zero, i.e.
the function $\pi(\theta) = [\theta(1 - \theta)]^{-1}$, which is not integrable on the unit interval but can be
used to derive generalized Bayes rules.

After these preliminary considerations, we are now ready to derive and study the Bayes
rules in the given framework. Let the likelihood function be given by (1), and assume that
a priori $\Theta_i \sim Be(\alpha_i, \beta_i)$, $i = 1, \ldots, k$, are independent, where the $\alpha$'s and $\beta$'s are all known.
It follows then that a posteriori, given $\underline{X} = \underline{x}$, $\Theta_i \sim Be(\alpha_i + x_i, \beta_i + n_i - x_i)$, $i = 1, \ldots, k$,
are independent. Moreover, in the marginal distribution of $\underline{X}$, $X_1, \ldots, X_k$ are independent
and $P\{X_i = x_i\}$ is given by (11) with $n$, $x$, $\alpha$, and $\beta$ indexed by $i$, $i = 1, \ldots, k$.

First, let us study the Bayes rules for losses of the additive type. Among other
interesting facts, we shall see that $B_1$ has an undesirable effect for large values of $p$, which
then makes $B_2$ preferable, and that $A_1$ leads to the same Bayes rules as $A_2$ with $c = 0$ and
$d = 1$. It is natural to begin with the simplest situation, which is the case
of $A = A_1$ and $B = B_1$ in (3), i.e. the loss function

$$ L(\underline{\theta}, (s, t_s)) = \theta_{[k]} - \theta_s + p[\theta_s - t_s]^2. \tag{13} $$


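Lemma 1 suffices here to find the Bayes rule $d^*(\underline{x})$ conveniently, since $B(\underline{\theta}, i) = p$. The
optimum estimates are found to be $t_i^*(\underline{x}) = (\alpha_i + x_i)/(\alpha_i + \beta_i + n_i)$, $i = 1, \ldots, k$, the usual
Bayes estimates for the single component estimation problems under squared error loss,
and $s^*(\underline{x})$ minimizes, for $i = 1, \ldots, k$,

$$ E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\} - t_i^*(\underline{x}) + \frac{p}{\alpha_i + \beta_i + n_i + 1}\, t_i^*(\underline{x})\,[1 - t_i^*(\underline{x})]. \tag{14} $$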


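The undesirable effect mentioned above comes from the fact that the posterior variance of
$\Theta_i$, $i \in \{1, \ldots, k\}$,

$$ \mathrm{Var}^{\pi(\cdot \mid \underline{x})}(\Theta_i) = \frac{(\alpha_i + x_i)(\beta_i + n_i - x_i)}{(\alpha_i + \beta_i + n_i)^2(\alpha_i + \beta_i + n_i + 1)} = (\alpha_i + \beta_i + n_i + 1)^{-1}\, t_i^*(\underline{x})\,[1 - t_i^*(\underline{x})], \tag{15} $$

decreases as $t_i^*(\underline{x})$ moves away from 0.5 in either direction. This causes a similar behavior
of (14) if $p > \alpha_i + \beta_i + n_i + 1$: the derivative of (14) with respect to $t_i^*(\underline{x})$ at zero equals
$-1 + p/(\alpha_i + \beta_i + n_i + 1)$, which is then positive. In such a case, if all $k$ estimates are close to zero, $s^*(\underline{x})$
would favor smaller estimates because of a smaller posterior risk due to estimation.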
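The effect can be seen in a small computation. The sketch below (hypothetical helper names, Python standard library only) evaluates criterion (14) with the common term $E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\}$ dropped, since it is the same for every population. With two small estimates, uniform priors and $n_i = 10$, a moderate $p$ selects the larger estimate, whereas $p = 50 > \alpha_i + \beta_i + n_i + 1 = 13$ selects the smaller one.

```python
def criterion_14(x, n, alpha, beta, p):
    """Estimate t* and criterion (14) for loss (13), without the term
    E{Theta_[k] | x}, which is the same for every population."""
    N = alpha + beta + n
    t = (alpha + x) / N                       # posterior mean
    return t, -t + p / (N + 1) * t * (1 - t)  # uses the variance identity (15)


def select_13(xs, ns, alphas, betas, p):
    """Return (s*, t*_{s*}) with a 1-based population index."""
    vals = [criterion_14(x, n, a, b, p) for x, n, a, b in zip(xs, ns, alphas, betas)]
    i = min(range(len(vals)), key=lambda j: vals[j][1])
    return i + 1, vals[i][0]


# Two populations with small estimates, uniform priors, n = 10 each.
print(select_13([1, 2], [10, 10], [1, 1], [1, 1], p=1.0))   # picks population 2
print(select_13([1, 2], [10, 10], [1, 1], [1, 1], p=50.0))  # p > 13: picks population 1
```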
From (14) one can also see that the term $E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\}$ has no influence at all on
the determination of the Bayes rule. It could as well be replaced by 1, i.e. $A_1$ could be
replaced by $A_2$ with $c = 0$ and $d = 1$ in the loss function without any change in the Bayes
rule $d^*(\underline{x})$.


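The next case to be considered is a loss function which combines $A = A_1$ and $B = B_2$
in (3), i.e.

$$ L(\underline{\theta}, (s, t_s)) = \theta_{[k]} - \theta_s + p[\theta_s(1 - \theta_s)]^{-1}[\theta_s - t_s]^2. \tag{16} $$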


We do not yet replace $\theta_{[k]}$ by 1, since in Section 4 this loss function will be used to derive a
fixed sample size look ahead procedure, where it is not obvious from the beginning that $A =
A_2$ with $c = 0$ and $d = 1$ gives the same rule as $A = A_1$ does. To simplify the presentation,
let us first look at the Bayes rules for $\alpha_i > 1$ and $\beta_i > 1$, $i = 1, \ldots, k$. To apply Theorem 1,
one can see from its definition (4) that $\tilde{\pi}_i(\cdot \mid \underline{x})$ is a $Be(\alpha_i + x_i - 1, \beta_i + n_i - x_i - 1)$-density, and
therefore the optimum estimates are $t_i^*(\underline{x}) = (\alpha_i + x_i - 1)/(\alpha_i + \beta_i + n_i - 2)$, $i = 1, \ldots, k$.
The variance of $\Theta_i$ under $\tilde{\pi}_i(\cdot \mid \underline{x})$ is readily available from (10), and the expectation of
$[\Theta_i(1 - \Theta_i)]^{-1}$ under $\pi(\cdot \mid \underline{x})$ is found from (9) by manipulating the normalizing factor of
the associated Beta-density; multiplied by $p$, their product equals $p/(\alpha_i + \beta_i + n_i - 2)$. Finally, $s^*(\underline{x})$ is found to minimize


$$ E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\} - \frac{\alpha_i + x_i - 1}{\alpha_i + \beta_i + n_i - 2} + \frac{p}{\alpha_i + \beta_i + n_i - 2}, \tag{17} $$

or, equivalently, to maximize $(\alpha_i + x_i - 1 - p)/(\alpha_i + \beta_i + n_i - 2)$, $i = 1, \ldots, k$. For the
noninformative prior with $\alpha_i = \beta_i = 1$, $i = 1, \ldots, k$, the Bayes rule turns out to have the
following simple and appealing form: $t_i^*(\underline{x}) = x_i/n_i$, $i = 1, \ldots, k$, and $s^*(\underline{x})$ maximizes
$(x_i - p)/n_i$, $i = 1, \ldots, k$, i.e. the Bernoulli sample means adjusted for their precisions due
to sample sizes.
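The rule for loss (16) can be sketched as follows, assuming independent $Be(\alpha_i, \beta_i)$ priors. The function implements $t_i^*(\underline{x})$ and criterion (17) without the common term $E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\}$, adds the two boundary adjustments stated in the next paragraph, and simply rejects any other boundary case. The name `rule_16` is illustrative only.

```python
def rule_16(xs, ns, alphas, betas, p):
    """Sketch of the Bayes rule for loss (16) with Be(alpha_i, beta_i) priors.

    Returns (s*, t*_{s*}) with a 1-based index.  Each population's value is
    criterion (17) minus the common term E{Theta_[k] | x}.
    """
    best = None
    for i, (x, n, a, b) in enumerate(zip(xs, ns, alphas, betas), start=1):
        if a + x > 1 and b + n - x > 1:
            t = (a + x - 1) / (a + b + n - 2)       # mean of Be(a + x - 1, b + n - x - 1)
            value = -t + p / (a + b + n - 2)
        elif x == 0 and a < 1:                      # adjustment: estimate forced to 0
            t, value = 0.0, p * a / (b + n - 1)
        elif x == n and b < 1:                      # adjustment: estimate forced to 1
            t, value = 1.0, -1.0 + p * b / (a + n - 1)
        else:
            raise ValueError(f"boundary case for population {i} not covered by this sketch")
        if best is None or value < best[2]:
            best = (i, t, value)
    return best[0], best[1]


# Uniform priors: s* maximizes (x_i - p)/n_i and t* = x_i/n_i.
print(rule_16([9, 18, 27], [20, 40, 60], [1, 1, 1], [1, 1, 1], p=5.0))
```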
Adjustments for the general case of positive $\alpha$'s and $\beta$'s are to be made as follows. If
$\alpha_i < 1$ and $x_i = 0$, then $t_i^*(\underline{x}) = 0$, and the value of (17) for that particular $i$ changes to
$E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\} + p\alpha_i/(\beta_i + n_i - 1)$. Similarly, if $\beta_i < 1$ and $x_i = n_i$, then $t_i^*(\underline{x}) = 1$, and
the value of (17) changes to $E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\} - 1 + p\beta_i/(\alpha_i + n_i - 1)$ for that particular
$i \in \{1, \ldots, k\}$.

The last and most general case of an additive type loss function is a choice of $A = A_2$
in (3), combined with $B = B_1$ or $B = B_2$. In view of Theorem 1 and the results derived
so far concerning $B$, what remains to be found is the conditional expectation of $A_2(\underline{\Theta}, i)$
at $\underline{X} = \underline{x}$, for $i = 1, \ldots, k$. Standard calculations show that for $i = 1, \ldots, k$,

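$$ E\{A_2(\underline{\Theta}, i) \mid \underline{X} = \underline{x}\} = \frac{\Gamma(\alpha_i + x_i - c)\,\Gamma(\beta_i + n_i - x_i + d)\,\Gamma(\alpha_i + \beta_i + n_i)}{\Gamma(\alpha_i + \beta_i + n_i + d - c)\,\Gamma(\alpha_i + x_i)\,\Gamma(\beta_i + n_i - x_i)} \tag{18} $$

if $\alpha_i + x_i > c$, whereas it is infinity if $\alpha_i + x_i \le c$. Thus, the Bayes rule $d^*(\underline{x})$ exists if
$x_i > c - \alpha_i$ for at least one $i \in \{1, \ldots, k\}$. The latter is guaranteed for all $\underline{x}$ if $c < \alpha_i$ for
at least one $i \in \{1, \ldots, k\}$. The explanation of the possibility of a nonexistent Bayes rule
for $\alpha_1, \ldots, \alpha_k \le c$ is quite simple. Obviously, $A_2$ does not only favor selection of $\theta$-values
close to their maximum, i.e. $\theta_{[k]}$, but it also requires the selected $\theta$-value to be large, i.e. close
to one.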
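Formula (18) can be evaluated stably on the log scale. The sketch below (hypothetical helper names, standard library only) returns infinity when $\alpha_i + x_i \le c$ and, for comparison, also implements the product form (19) for integer $c = d = \lambda$ given next.

```python
import math


def expected_A2(x, n, alpha, beta, c, d):
    """Posterior expectation (18) of A_2 = theta^(-c) (1-theta)^d under Be(alpha+x, beta+n-x)."""
    a, b = alpha + x, beta + n - x
    if a <= c:
        return math.inf  # the integral diverges at theta = 0
    return math.exp(math.lgamma(a - c) + math.lgamma(b + d) + math.lgamma(a + b)
                    - math.lgamma(a + b + d - c) - math.lgamma(a) - math.lgamma(b))


def expected_A2_integer(x, n, alpha, beta, lam):
    """Product form (19) for c = d = lam, a positive integer, assuming alpha + x > lam."""
    out = 1.0
    for j in range(1, lam + 1):
        out *= (beta + n - x + j - 1) / (alpha + x - j)
    return out


print(expected_A2(9, 20, 1, 1, 2, 2), expected_A2_integer(9, 20, 1, 1, 2))
```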


For the special case of $c = d = \lambda$, where $\lambda$ is a positive integer, (18) reduces to

$$ E\{A_2(\underline{\Theta}, i) \mid \underline{X} = \underline{x}\} = \prod_{j=1}^{\lambda} \frac{\beta_i + n_i - x_i + j - 1}{\alpha_i + x_i - j}, \tag{19} $$

provided, of course, that $\alpha_i + x_i > \lambda$, $i = 1, \ldots, k$. For this case, $A_2(\underline{\theta}, s) = (\theta_s^{-1} - 1)^{\lambda}$,
and $\lambda = 1$ is seen to lead to the same Bayes rule as $A_2$ with $c = 1$ and $d = 0$. The Bayes
rule for the loss function


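$$ L(\underline{\theta}, (s, t_s)) = \theta_s^{-1} + p[\theta_s(1 - \theta_s)]^{-1}[\theta_s - t_s]^2 \tag{20} $$

employs the estimates $t_i^*(\underline{x})$, $i = 1, \ldots, k$, which are given above (17), and $s^*(\underline{x})$ minimizes,
for $i = 1, \ldots, k$,

$$ \frac{\alpha_i + \beta_i + n_i - 1}{\alpha_i + x_i - 1} + \frac{p}{\alpha_i + \beta_i + n_i - 2}. \tag{21} $$

In particular, for the noninformative prior with $\alpha_i = \beta_i = 1$, $i = 1, \ldots, k$, we have
$t_i^*(\underline{x}) = x_i/n_i$, $i = 1, \ldots, k$, and $s^*(\underline{x})$ minimizes $(n_i + 1)/x_i + p/n_i$, $i = 1, \ldots, k$.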
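For loss (20) the whole rule fits in a few lines. This is a sketch assuming $\alpha_i + x_i > 1$ and $\beta_i + n_i - x_i > 1$ for every $i$; the name `rule_20` is illustrative only.

```python
def rule_20(xs, ns, alphas, betas, p):
    """Sketch of the Bayes rule for loss (20); assumes alpha_i + x_i > 1 and beta_i + n_i - x_i > 1."""
    best = None
    for i, (x, n, a, b) in enumerate(zip(xs, ns, alphas, betas), start=1):
        t = (a + x - 1) / (a + b + n - 2)                            # same estimate as for (16)
        value = (a + b + n - 1) / (a + x - 1) + p / (a + b + n - 2)  # criterion (21)
        if best is None or value < best[2]:
            best = (i, t, value)
    return best[0], best[1]


# Uniform priors: the criterion reduces to (n_i + 1)/x_i + p/n_i.
print(rule_20([9, 18, 27], [20, 40, 60], [1, 1, 1], [1, 1, 1], p=1.0))
```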


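Another special case of interest is $A = A_2$ with $c = 0$ and $d = 2$ combined with
$B = B_1$, i.e.

$$ L(\underline{\theta}, (s, t_s)) = (1 - \theta_s)^2 + p[\theta_s - t_s]^2. \tag{22} $$

Here we have, as before with (13), $t_i^*(\underline{x}) = (\alpha_i + x_i)/(\alpha_i + \beta_i + n_i)$, $i = 1, \ldots, k$. Instead
of (14), however, $s^*(\underline{x})$ this time minimizes, for $i = 1, \ldots, k$,

$$ 1 - t_i^*(\underline{x}) + \frac{p - (\alpha_i + \beta_i + n_i)}{\alpha_i + \beta_i + n_i + 1}\, t_i^*(\underline{x})\,[1 - t_i^*(\underline{x})], \tag{23} $$

which follows from $E\{(1 - \Theta_i)^2 \mid \underline{X} = \underline{x}\} = [1 - t_i^*(\underline{x})]^2 + \mathrm{Var}^{\pi(\cdot \mid \underline{x})}(\Theta_i)$ and (15).
It is interesting to note that if in (14), $p$ is replaced by $p - (\alpha_i + \beta_i + n_i)$, $i = 1, \ldots, k$, then
the minimization criterion becomes exactly that of (23), apart from the term $E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\}$
in place of the constant 1, which does not depend on $i$.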
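The relation between (14) and (23) is purely algebraic, so it can be checked numerically. The sketch below (illustrative names) computes the posterior expected loss of (22) at $t = t_i^*(\underline{x})$ directly, as $[1 - t_i^*]^2 + (1 + p)\,\mathrm{Var}$ via (15), and compares it with criterion (14) after the substitution $p \to p - (\alpha_i + \beta_i + n_i)$ and with the constant 1 in place of $E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\}$.

```python
def loss22_expected(x, n, alpha, beta, p):
    """Posterior expected loss (22) at the Bayes estimate t*."""
    N = alpha + beta + n
    t = (alpha + x) / N                   # Bayes estimate under (22), as with (13)
    var = t * (1 - t) / (N + 1)           # posterior variance, identity (15)
    return (1 - t) ** 2 + var + p * var   # E[(1 - Theta)^2] + p * Var


def criterion_14_shifted(x, n, alpha, beta, p):
    """Criterion (14) with p replaced by p - N and the constant 1 instead of E{Theta_[k] | x}."""
    N = alpha + beta + n
    t = (alpha + x) / N
    return 1 - t + (p - N) / (N + 1) * t * (1 - t)


for args in [(9, 20, 1, 1, 0.5), (3, 7, 2.5, 0.7, 10.0), (0, 5, 0.3, 4.0, 2.0)]:
    print(loss22_expected(*args), criterion_14_shifted(*args))
```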


At the end of this section, Bayes rules for loss functions of the multiplicative type will
be studied. Let

$$ L(\underline{\theta}, (s, t_s)) = \theta_s^{-c}(1 - \theta_s)^{d}\,[\theta_s - t_s]^2. \tag{24} $$
To apply Theorem 1, where now $A = 0$ and moreover $B(\underline{\theta}, s) = \theta_s^{-c}(1 - \theta_s)^{d}$, one can see
from its definition (4) that $\tilde{\pi}_i(\cdot \mid \underline{x})$ is a $Be(\alpha_i + x_i - c, \beta_i + n_i - x_i + d)$-density whenever
$\alpha_i + x_i > c$, $i = 1, \ldots, k$. Thus, from (10) it follows that $t_i^*(\underline{x}) = (\alpha_i + x_i - c)/(\alpha_i + \beta_i +
n_i + d - c)$ if $\alpha_i + x_i > c$, and one can see easily that $t_i^*(\underline{x}) = 0$ otherwise, $i = 1, \ldots, k$.

For $\alpha_i + x_i > c$, the expectation of $B(\underline{\Theta}, i)$ under $\pi(\cdot \mid \underline{x})$ can be found by manipulating
normalizing factors of the associated Beta-densities, and the variance of $\Theta_i$ under $\tilde{\pi}_i(\cdot \mid \underline{x})$
is provided by (10). Finally, the product of the two, which enters the minimization step of
$s^*(\underline{x})$, turns out to be the following:


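$$ \frac{1}{\alpha_i + \beta_i + n_i + d - c} \cdot \frac{\Gamma(\alpha_i + \beta_i + n_i)\,\Gamma(\alpha_i + x_i - c + 1)\,\Gamma(\beta_i + n_i - x_i + d + 1)}{\Gamma(\alpha_i + x_i)\,\Gamma(\beta_i + n_i - x_i)\,\Gamma(\alpha_i + \beta_i + n_i + d - c + 2)}. \tag{25} $$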


If for some $i \in \{1, \ldots, k\}$, $\alpha_i + x_i \le c - 2$, then the value in (25) has to be replaced by
infinity. And if $c - 2 < \alpha_i + x_i \le c$, then the replacement value equals

$$ \frac{\Gamma(\alpha_i + \beta_i + n_i)\,\Gamma(\alpha_i + x_i - c + 2)\,\Gamma(\beta_i + n_i - x_i + d)}{\Gamma(\alpha_i + x_i)\,\Gamma(\beta_i + n_i - x_i)\,\Gamma(\alpha_i + \beta_i + n_i + d - c + 2)}. \tag{26} $$


Only one of several interesting special cases will be considered, for brevity. For $c = 0$
and $d = 2$, we have $t_i^*(\underline{x}) = (\alpha_i + x_i)/(\alpha_i + \beta_i + n_i + 2)$, $i = 1, \ldots, k$. And since $\alpha_i + x_i > c$
is always fulfilled here, (25) is used in the minimization step of $s^*(\underline{x})$ for all $i = 1, \ldots, k$.
Especially for the noninformative prior with $\alpha_i = \beta_i = 1$, $i = 1, \ldots, k$, $s^*(\underline{x})$ is seen to
minimize the following for $i = 1, \ldots, k$:

$$ \frac{(x_i + 1)(n_i + 1 - x_i)(n_i + 2 - x_i)(n_i + 3 - x_i)}{(n_i + 2)(n_i + 3)(n_i + 4)^2(n_i + 5)}. \tag{27} $$
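The case distinctions between (25), (26) and infinity translate directly into code. The sketch below (illustrative names, standard library only) returns $t_i^*(\underline{x})$ together with the term entering the minimization for $s^*(\underline{x})$, and checks the special case $c = 0$, $d = 2$ against (27); the data in the final call are those of the numerical example at the end of this section.

```python
import math


def multiplicative_term(x, n, alpha, beta, c, d):
    """Return (t*, value) for loss (24): value is (25), (26) or infinity."""
    a, b = alpha + x, beta + n - x                      # posterior Be(a, b)
    log_base = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    if a > c:
        t = (a - c) / (a + b + d - c)                   # mean of Be(a - c, b + d)
        log_val = (log_base + math.lgamma(a - c + 1) + math.lgamma(b + d + 1)
                   - math.lgamma(a + b + d - c + 2))
        return t, math.exp(log_val) / (a + b + d - c)   # (25)
    if a > c - 2:
        log_val = (log_base + math.lgamma(a - c + 2) + math.lgamma(b + d)
                   - math.lgamma(a + b + d - c + 2))
        return 0.0, math.exp(log_val)                   # (26), estimate forced to 0
    return 0.0, math.inf


def select_24(xs, ns, alphas, betas, c, d):
    """1-based index s* with the smallest value, and its estimate t*."""
    terms = [multiplicative_term(x, n, a, b, c, d) for x, n, a, b in zip(xs, ns, alphas, betas)]
    i = min(range(len(terms)), key=lambda j: terms[j][1])
    return i + 1, terms[i][0]


def criterion_27(x, n):
    """Closed form (27) for c = 0, d = 2 and alpha = beta = 1."""
    return ((x + 1) * (n + 1 - x) * (n + 2 - x) * (n + 3 - x)
            / ((n + 2) * (n + 3) * (n + 4) ** 2 * (n + 5)))


print(select_24([9, 18, 27], [20, 40, 60], [1, 1, 1], [1, 1, 1], c=0, d=2))
```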


Selecting the largest of $k$ success probabilities without estimating the selected parameter
$\theta_s$ has been treated previously by Abughalous and Miescke (1989). Some fundamental
properties of the Bayes selection rule have been shown there to hold under all permutation
symmetric priors and for all monotone, permutation invariant loss functions. Among
others, one is that population $i$ is preferred over population $j$ if $x_i \ge x_j$ and $n_i - x_i \le n_j - x_j$
hold simultaneously with at least one strict inequality. These properties are lost when
estimation is incorporated in the loss function, as in the present study.

To conclude this section, let us consider a numerical example. As in Abughalous and
Miescke (1989), assume that $k = 3$ types of games are examined, where game 1 has been
played $n_1 = 20$ times with $x_1 = 9$ wins, game 2 has been played $n_2 = 40$ times with $x_2 = 18$
wins, and game 3 has been played $n_3 = 60$ times with $x_3 = 27$ wins. Apparently, the
winning rate is 0.45 in all three game types, and it is not clear from this information alone
which of the three is preferable. Suppose in the following that $\alpha_i = \beta_i = 1$, $i = 1, \ldots, k$.

Under loss (13), $s^*(\underline{x}) = 1$ and $t_1^*(\underline{x}) = 0.4545$ whenever $p < 0.4281$, whereas $s^*(\underline{x}) = 3$
and $t_3^*(\underline{x}) = 0.4516$ otherwise. Under loss (16) as well as (20), $s^*(\underline{x}) = 3$ for all values
of $p$ and $t_3^*(\underline{x}) = 0.45$. Under loss (22), $s^*(\underline{x}) = 3$ for all values of $p$, but $t_3^*(\underline{x}) = 0.4516$.
Finally, under loss (24) with $c = 0$ and $d = 2$, one gets $s^*(\underline{x}) = 3$ and $t_3^*(\underline{x}) = 0.4375$.
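The figures in this example can be reproduced with a few lines of Python. The sketch below hard-codes the data and $\alpha_i = \beta_i = 1$, evaluates the criteria (14), (17), (21), (23) and (27), and locates the switch point for loss (13) by bisection on $[0, 1]$, where only populations 1 and 3 are ever selected. The switch point it finds lies between 0.4281 and 0.4282, consistent with the value quoted above. All function names are hypothetical.

```python
XS, NS = (9, 18, 27), (20, 40, 60)   # games 1-3 of the example; alpha_i = beta_i = 1


def argmin1(values):
    """1-based index of the smallest value."""
    return 1 + min(range(len(values)), key=values.__getitem__)


def sel_13(p):
    t = [(1 + x) / (2 + n) for x, n in zip(XS, NS)]
    s = argmin1([-ti + p / (n + 3) * ti * (1 - ti) for ti, n in zip(t, NS)])  # (14)
    return s, t[s - 1]


def sel_16(p):
    s = argmin1([-(x - p) / n for x, n in zip(XS, NS)])        # maximize (x_i - p)/n_i
    return s, XS[s - 1] / NS[s - 1]


def sel_20(p):
    s = argmin1([(n + 1) / x + p / n for x, n in zip(XS, NS)])  # (21) with alpha = beta = 1
    return s, XS[s - 1] / NS[s - 1]


def sel_22(p):
    t = [(1 + x) / (2 + n) for x, n in zip(XS, NS)]
    s = argmin1([1 - ti + (p - (n + 2)) / (n + 3) * ti * (1 - ti)
                 for ti, n in zip(t, NS)])                       # (23)
    return s, t[s - 1]


def sel_24():
    s = argmin1([(x + 1) * (n + 1 - x) * (n + 2 - x) * (n + 3 - x)
                 / ((n + 2) * (n + 3) * (n + 4) ** 2 * (n + 5))
                 for x, n in zip(XS, NS)])                       # (27), c = 0, d = 2
    return s, (1 + XS[s - 1]) / (NS[s - 1] + 4)


# Bisection for the value of p at which the selection under loss (13) switches.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if sel_13(mid)[0] == 1:
        lo = mid
    else:
        hi = mid

print(f"loss (13): s* switches from 1 to 3 at p = {lo:.6f}")
print("loss (16):", sel_16(1.0), " loss (20):", sel_20(1.0))
print("loss (22):", sel_22(1.0), " loss (24), c=0, d=2:", sel_24())
```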

At the end of the next section, where a fixed sample size look ahead procedure is
derived, this example will be considered again.


4. A Fixed Sample Size Look Ahead Procedure

The question considered in this section is whether it is worthwhile to take additional
observations after having observed $X_i \sim B(n_i, \theta_i)$, $i = 1, \ldots, k$, if the loss is of the type
(16), augmented by costs for sampling. Let

$$ L(\underline{\theta}, (s, t_s)) = \theta_{[k]} - \theta_s + p[\theta_s(1 - \theta_s)]^{-1}[\theta_s - t_s]^2 + \gamma N, \tag{28} $$

where $N = n_1 + \cdots + n_k$ and $\gamma$ is the cost of observing one Bernoulli variable. Let a priori
$\Theta_i \sim Be(\alpha_i, \beta_i)$, $i = 1, \ldots, k$, be independent, where for simplicity of presentation $\alpha_i > 1$
and $\beta_i > 1$, $i = 1, \ldots, k$, is assumed.

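If no further observations are taken, the Bayes decision is the one described following (16), and
the posterior Bayes risk is the minimum of (17) over $i = 1, \ldots, k$, increased by the sampling cost $\gamma N$, i.e.

$$ E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\} - \max_{i=1,\ldots,k} \frac{\alpha_i + x_i - p - 1}{\alpha_i + \beta_i + n_i - 2} + \gamma N. \tag{29} $$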


Suppose now that we consider taking additional observations $Y_i \sim B(m_i, \theta_i)$, $i = 1, \ldots, k$,
which are mutually independent and independent of $X_1, \ldots, X_k$. The posterior expected
risk at $\underline{X} = \underline{x}$ is seen to be

$$ E\{E\{\Theta_{[k]} \mid \underline{X}, \underline{Y}\} \mid \underline{X} = \underline{x}\} + \gamma(M + N) - E\left\{ \max_{i=1,\ldots,k} \frac{\alpha_i + x_i + Y_i - p - 1}{\alpha_i + \beta_i + n_i + m_i - 2} \,\middle|\, \underline{X} = \underline{x} \right\}, \tag{30} $$

where $M = m_1 + \cdots + m_k$. Since the first term in (30), i.e. the iterated conditional
expectation, is simply $E\{\Theta_{[k]} \mid \underline{X} = \underline{x}\}$, $\theta_{[k]}$ could be replaced by 1 in (28) without changing
any result in this section. The following is now seen to hold.


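Theorem 2. At $\underline{X} = \underline{x}$, it is worthwhile taking these additional observations $Y_1, \ldots, Y_k$
if

$$ \max_{i=1,\ldots,k} \frac{\alpha_i + x_i - p - 1}{\alpha_i + \beta_i + n_i - 2} + \gamma M < E\left\{ \max_{i=1,\ldots,k} \frac{\alpha_i + x_i + Y_i - p - 1}{\alpha_i + \beta_i + n_i + m_i - 2} \,\middle|\, \underline{X} = \underline{x} \right\}. \tag{31} $$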
This result can be used in several ways, depending on the sampling scheme adopted. First,
one could search through all possible $\underline{m} = (m_1, \ldots, m_k)$ to determine whether it is worthwhile
at all to take more observations. This fixed sample size look ahead procedure is due to
Amster (1963) and is discussed in Berger (1985). It is useful in situations like the present
one, where a fully sequential Bayes procedure is not feasible. Second, if $M = m_1 + \cdots + m_k$ is
fixed in advance, one could find the optimum allocation $\underline{m}^*$, say, which maximizes the
conditional expectation shown in the theorem, and go ahead with additional observations
using allocation $\underline{m}^*$ if the inequality is met. This procedure can be called an adaptive look
ahead $M$ procedure. Other possible applications of Theorem 2 are reasonable but omitted
for brevity.

All that is needed to find these procedures is the conditional distribution of $\underline{Y}$, given
$\underline{X} = \underline{x}$. Since a priori $\Theta_1, \ldots, \Theta_k$ are independent, we have

$$ P\{\underline{Y} = \underline{y} \mid \underline{X} = \underline{x}\} = \prod_{i=1}^{k} P\{Y_i = y_i \mid X_i = x_i\}, \tag{32} $$

and the conditional distribution of $Y_i$, given $X_i = x_i$, is the same as the marginal distribution
of $Y_i$ with respect to the “updated” prior $Be(\alpha_i + x_i, \beta_i + n_i - x_i)$, i.e. in view of
(11), for $i = 1, \ldots, k$, it follows that

$$ P\{Y_i = y_i \mid X_i = x_i\} = \binom{m_i}{y_i} \frac{\Gamma(\alpha_i + \beta_i + n_i)\,\Gamma(\alpha_i + x_i + y_i)\,\Gamma(\beta_i + n_i - x_i + m_i - y_i)}{\Gamma(\alpha_i + x_i)\,\Gamma(\beta_i + n_i - x_i)\,\Gamma(\alpha_i + \beta_i + n_i + m_i)}, \tag{33} $$

where $x_i \in \{0, 1, \ldots, n_i\}$ and $y_i \in \{0, 1, \ldots, m_i\}$. This can be used quite easily in a
computer program to evaluate the conditional expectation in the criterion given in Theorem
2. An upper bound to the latter is provided by replacing $Y_i$ by $m_i$, $i = 1, \ldots, k$, in it.
Thus, the search through all possible $\underline{m}$ in the first described fixed sample size look ahead
procedure is actually limited to a finite, typically small, collection of $\underline{m}$'s, as the cost of
additional sampling, i.e. $\gamma M$, becomes prohibitive as $M$ increases.
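A direct implementation of criterion (31) only needs (32) and (33). The sketch below (hypothetical function names; Python standard library only) enumerates all outcomes $\underline{y}$, which is cheap for the small $m_i$ considered here, and returns the right hand side of (31) minus its left hand side, so that a positive value means that taking the additional observations is worthwhile. It does not itself enforce $\alpha_i > 1$, $\beta_i > 1$; the example call uses the uniform priors of the numerical example.

```python
import itertools
import math


def predictive_pmf(y, m, x, n, alpha, beta):
    """P{Y_i = y | X_i = x} from (33): beta-binomial with the updated prior."""
    a, b = alpha + x, beta + n - x
    log_val = (math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
               + math.lgamma(a + b) + math.lgamma(a + y) + math.lgamma(b + m - y)
               - math.lgamma(a) - math.lgamma(b) - math.lgamma(a + b + m))
    return math.exp(log_val)


def look_ahead_gain(ms, xs, ns, alphas, betas, p, gamma):
    """Right side minus left side of (31); positive means sampling m is worthwhile."""
    k = len(xs)
    current = max((alphas[i] + xs[i] - p - 1) / (alphas[i] + betas[i] + ns[i] - 2)
                  for i in range(k))
    pmfs = [[predictive_pmf(y, ms[i], xs[i], ns[i], alphas[i], betas[i])
             for y in range(ms[i] + 1)] for i in range(k)]
    expected = 0.0
    for ys in itertools.product(*(range(m + 1) for m in ms)):
        prob = math.prod(pmfs[i][ys[i]] for i in range(k))   # independence, (32)
        best = max((alphas[i] + xs[i] + ys[i] - p - 1)
                   / (alphas[i] + betas[i] + ns[i] + ms[i] - 2) for i in range(k))
        expected += prob * best
    return expected - current - gamma * sum(ms)


xs, ns, ones = (9, 18, 27), (20, 40, 60), (1, 1, 1)
print(look_ahead_gain((1, 1, 5), xs, ns, ones, ones, p=7.0, gamma=0.001))
print(look_ahead_gain((1, 1, 1), xs, ns, ones, ones, p=7.0, gamma=0.001))
```

The first call gives a positive value and the second a negative one, i.e. for this example five further plays of game 3 pay off at $\gamma = 0.001$, while a single one does not.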


To conclude this section, let us continue the treatment of our numerical example
considered at the end of the previous section. For $\alpha_i = \beta_i = 1$, $i = 1, \ldots, k$, (33) can be
written as

$$ P\{Y_i = y_i \mid X_i = x_i\} = \frac{n_i + 1}{n_i + m_i + 1} \binom{n_i}{x_i}\binom{m_i}{y_i} \Big/ \binom{n_i + m_i}{x_i + y_i}, \tag{34} $$

which can be computed with a subroutine that provides hypergeometric probabilities.
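Form (34) can be checked against (33) numerically. The sketch below uses `math.comb` (Python 3.8 or later) for the hypergeometric part and `math.lgamma` for (33); the function names are illustrative only.

```python
import math


def pmf_33(y, m, x, n, alpha, beta):
    """(33) evaluated with log-gamma."""
    a, b = alpha + x, beta + n - x
    return math.comb(m, y) * math.exp(
        math.lgamma(a + b) + math.lgamma(a + y) + math.lgamma(b + m - y)
        - math.lgamma(a) - math.lgamma(b) - math.lgamma(a + b + m))


def pmf_34(y, m, x, n):
    """(34): (n+1)/(n+m+1) times a hypergeometric probability (alpha = beta = 1)."""
    hyper = math.comb(n, x) * math.comb(m, y) / math.comb(n + m, x + y)
    return (n + 1) / (n + m + 1) * hyper


for m in range(1, 6):
    gap = max(abs(pmf_33(y, m, 9, 20, 1, 1) - pmf_34(y, m, 9, 20)) for y in range(m + 1))
    print(m, gap)
```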


For $1 \le m_i \le 5$, $i = 1, 2, 3$, $p = 7$, and $\gamma = 0.001$, the inequality (31) is achieved for
$(m_1, m_2, m_3)$ equal to the following configurations: (1,1,3), (1,1,4), (1,1,5), (1,2,4), (1,2,5),
(1,3,5), (2,1,4), (2,1,5), (2,2,5), and (3,1,5). The largest difference between the right hand
and the left hand side of (31) occurs at $(m_1, m_2, m_3) = (1,1,5)$.


In the same setting, if $p$ is replaced by $p = 1.9$, i.e. if emphasis is shifted away from
estimation toward selection, then the inequality (31) is achieved at the $(m_1, m_2, m_3)$-configurations
(1,5,1), (5,3,1), (5,4,1), (5,4,2), and (5,5,1). The largest difference of both
sides of (31) occurs this time at $(m_1, m_2, m_3) = (5,5,1)$.
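The search just described can be sketched as follows, under the same assumptions as in the text ($\alpha_i = \beta_i = 1$, $1 \le m_i \le 5$, $\gamma = 0.001$). The code uses (34) for the predictive probabilities, evaluates the conditional expectation in (31) exactly by enumerating all outcomes, and reports the configurations with a positive difference together with the one where the difference is largest; all names are hypothetical. For $p = 7$ it should reproduce the ten configurations listed above, with the maximum at (1,1,5); the output of the $p = 1.9$ call can be compared with the second list.

```python
import itertools
import math

XS, NS = (9, 18, 27), (20, 40, 60)   # data of the example, alpha_i = beta_i = 1


def pmf(y, m, x, n):
    """(34): predictive probability of y further successes in m trials."""
    return (n + 1) / (n + m + 1) * math.comb(n, x) * math.comb(m, y) / math.comb(n + m, x + y)


def gain(ms, p, gamma):
    """Right side minus left side of (31) for alpha_i = beta_i = 1."""
    current = max((x - p) / n for x, n in zip(XS, NS))
    expected = 0.0
    for ys in itertools.product(*(range(m + 1) for m in ms)):
        prob = math.prod(pmf(y, m, x, n) for y, m, x, n in zip(ys, ms, XS, NS))
        expected += prob * max((x + y - p) / (n + m) for x, y, n, m in zip(XS, ys, NS, ms))
    return expected - current - gamma * sum(ms)


def scan(p, gamma=0.001, m_max=5):
    """All m in {1,...,m_max}^3 satisfying (31), and the m with the largest difference."""
    configs = list(itertools.product(range(1, m_max + 1), repeat=3))
    gains = {ms: gain(ms, p, gamma) for ms in configs}
    worthwhile = sorted(ms for ms, g in gains.items() if g > 0)
    return worthwhile, max(gains, key=gains.get)


for p in (7.0, 1.9):
    worthwhile, best = scan(p)
    print(p, worthwhile, best)
```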